Abstract

Background: Cave animals converge evolutionarily on a suite of troglomorphic traits, the best known of which are 
eyelessness and depigmentation. We studied 11 cave and 10 surface populations of Astyanax mexicanus in order to
better understand the evolutionary origins of the cave forms, the basic genetic structuring of both cave and 
surface populations, and the degree to which present day migration among them affects their genetic divergence. 

Results: To assess the genetic structure within populations and the relationships among them we genotyped 
individuals at 26 microsatellite loci. We found that surface populations are similar to one another, despite their 
relatively large geographic separation, whereas the cave populations are better differentiated. The cave populations 
we studied span the full range of the cave forms in three separate geographic regions and have at least five 
separate evolutionary origins. Cave populations had lower genetic diversity than surface populations, correlated 
with their smaller effective population sizes, probably the result of food and space limitations. Some of the cave 
populations receive migrants from the surface and exchange migrants with one another, especially when 
geographically close. This admixture results in significant heterozygote deficiencies at numerous loci due to 
Wahlund effects. Cave populations receiving migrants from the surface contain small numbers of individuals that 
are intermediate in both phenotype and genotype, affirming at least limited gene flow from the surface. 

Conclusions: Cave populations of this species are derived from two different surface stocks denoted "old" and 
"new." The old stock colonized caves at least three times independently while the new stock colonized caves at 
least twice independently. Thus, the similar cave phenotypes found in these caves are the result of repeated 
convergences. These phenotypic convergences have occurred in spite of gene flow from surface populations 
suggesting either strong natural or sexual selection for alleles responsible for the cave phenotype in the cave 
environment. 



Background 

The mechanisms underlying the evolution of convergent 
phenotypes in independent natural populations pose 
long-standing questions in evolutionary biology. The 
extent to which convergent or parallel changes draw on 
preexisting genetic variation in ancestral populations 
versus new mutations is still debated [1,2]. The molecular
and genetic changes that underlie most convergences
are still unknown.
Convergence is also of interest to evolutionists
because it provides an element of replication to evolu- 
tionary studies that is often otherwise absent. Replica- 
tion allows for the powerful testing of evolutionary 
hypotheses. Cave-dwelling organisms provide the best 
known examples of convergences, sharing similar phe- 
notypes such as loss of eyes and pigmentation across 
diverse taxonomic groups [3-5]. 

The Mexican blind cavefish (Astyanax mexicanus) is 
nearly unique among cave animals because the cave 
forms have closely related surface conspecifics and the 
two forms are fully interfertile [6]. The ability to hybri- 
dize the cave and surface forms permits the genetic ana- 
lysis of the factors involved in cave adaptation. There 
are 29 known cave populations of this species dispersed
over three geographically distinct areas, thus this group 
may contain multiple examples of convergence [7]. 

Each population inhabits a food- and light-restricted
cave environment. Members of these populations exhibit
numerous cave-related traits, including reduced
pigmentation and eye size, hypertrophy of non-optic
sensory organs, increased condition factor, and robust
patterns of reduced sleep; presumably all evolved in
response to perpetual darkness and reduced food
availability [6-10].

Thus, the cave colonizations of Astyanax populations 
provide replicates of an excellent "natural experiment" 
which allows us to address important evolutionary ques- 
tions, including the extent to which evolutionary changes 
in morphology, behavior and physiology are driven by 
selection versus drift [11-13]. These two alternatives can 
be distinguished in a number of ways in this system, but 
any determination will require an understanding of the 
underlying demography of the populations as well as a 
clarification of the relationships among them. 

Previous phylogeographic studies of Astyanax cavefish, 
using microsatellites and mtDNA, showed that the cave
populations are derived from at least two different sur- 
face stocks that inhabited the Sierra de El Abra and 
nearby regions in succession [14-17]. The estimates 
from mtDNA suggest that these two groups diverged 
about 6.7 Mya [15]. Surface forms of the older stock ori- 
ginally inhabited the rivers in the El Abra region and 
were the likely ancestors of a series of cave populations, 
which we designate as "old." Subsequently, the surface 
fish of the old stock went locally extinct. The region 
was then invaded by another stock of A. mexicanus, 
which gave rise to the current surface populations and a 
second set of cave populations we designate as "new" 
(Figure 1) [14,16,17]. 

While previous studies revealed that the extant cave 
populations were derived from a minimum of two 
ancestral stocks, there may have been more. In addition, 
the question of how many independent invasions of the 
underground led to the present day Astyanax cave fauna 
remains unanswered. To understand the demographic 
component of the phenotypic evolution we studied cave 
populations from the full extent of their known distribu- 
tion. We give a detailed description of genetic differen- 
tiation in multiple cave populations and their 
relatedness with surface morphs, and estimate effective 
population sizes and the rates of gene flow among select 
populations based on multiple independent markers. 

Methods 

Sampling 

All fish specimens were collected in March 2008 and
preserved in 70% ethanol. A total of 568 Astyanax
samples were taken from 11 cave and 10 surface locations.
The cave populations sampled can be divided into
three geographically distinct regions, which span the full
geographic range of the cave forms. The first is the
Sierra de El Abra cave cluster, which is the most extensive
of the regions. To its west is the second region,
Micos, on the western slope of the Sierra de Colmena,
and finally, to the north of the El Abra, is the Sierra de
Guatemala region.

The El Abra cave cluster is represented in our study by
eight caves (from north to south, O1 to O8): Pachon, Yerbaniz,
Japones, Arroyo, Tinaja, Curva, Toro, and Chica,
respectively. In the Sierra de Guatemala we sampled two
caves, Molino (N1) and Caballo Moro (N2), and in the
Micos area we sampled one of three closely clustered
caves (Rio Subterraneo, referred to below as Micos or N3).
An overview of the geographical distribution of the cave
and surface sampling locations is presented in Figure 1,
and the locality abbreviations are shown in Table 1.

DNA extraction and genotyping were done according 
to Protas et al. (2006). All samples were profiled at 26 
microsatellite markers with primers previously devel- 
oped for QTL studies [10]. The forward primer of each 
pair was labeled at the 5' end with a fluorescent dye 
(HEX or FAM) and microsatellite amplification products 
were visualized on an ABI 3730 automated DNA 
sequencer. Microsatellite markers were optimized for 
the allelic range and multiplexed. Allele sizes were 
scored using GENEMAPPER v3.7 (ABI). We used unlinked markers
selected from independent linkage groups, or markers 
so distant as to assort independently if within the same 
linkage group [9]. In addition, all 26 of the markers 
(Additional File 1) were chosen such that they were out- 
side the previously identified QTL regions [9,10,12]. 

The database was optimized for analysis using the 
coalescent-based program Migrate-N 3.2.6. [18-20]. The 
accuracy of estimates based on the coalescent approach 
depends more on number of independent loci than on 
sample size [21]. Thus, our choice of 26 independently 
assorting loci, biased towards neutrality, maximized the 
amount of information per unit effort we could extract 
from the analyses. Furthermore, parameter estimates 
from Migrate-N 3.2.6 are uninfluenced by missing data. 
Therefore, we examined all GENEMAPPER calls by 
hand to verify their validity. Amplifications that were 
too weak to resolve the peaks or had extra peaks were 
reamplified and rerun to resolve the problem. Any 
remaining unresolved were treated as missing data. The 
overall data set has approximately 20% missing data, but 
gives unbiased estimates of parameters based on 
Migrate-N 3.2.6. In order to check for the existence of 
null alleles, and to evaluate their impact on the estima- 
tion of genetic differentiation we used the program 
MICRO-CHECKER v2.23 [22]. 
Figure 1 Map of the Sierra de El Abra region showing all the cave and surface collection sites. Colored boxes delineate major
geographical regions (labeled below), as follows: El Abra region: O1 - O8 (blue & green circles); Guatemala region: N1 - N2 (red circles); Micos
region: N3 (purple circles); Surface localities S1 - S4 (yellow circles). Light blue lines represent different river systems in the area.



Genetic diversity 

We calculated observed (H_o) and unbiased expected
(H_e) heterozygosities [23], number of alleles, and the
number of alleles standardized for the smallest sample
size for single populations and for the geographic
groups. These descriptive statistics were calculated in
Genepop v 4.0 [24] and Microsatellite Analyzer (MSA)
[25]. Deviations from Hardy-Weinberg equilibrium (HWE)
were estimated using both the exact test and the F_IS
statistic, using Markov chain Monte Carlo (MCMC) runs of 1000
batches, each of 2000 iterations, with the first 500 iterations
discarded before sampling [26]. Whenever multiple
testing was performed, probability values were corrected
using standard Bonferroni corrections [27].
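
As an illustration of the quantities named above, the following Python sketch computes observed heterozygosity, the unbiased expected heterozygosity (using the (2N/(2N-1)) correction given in the Table 1 legend), and a simple F_IS = 1 - H_o/H_e for a single locus. The function names and the toy genotypes are hypothetical, and Genepop and MSA use their own estimators, so this is a sketch of the idea rather than a reimplementation of either program.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def unbiased_expected_heterozygosity(genotypes):
    """(2N / (2N - 1)) * (1 - sum of squared allele frequencies)."""
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_genes = 2 * n
    homozygosity = sum((c / total_genes) ** 2 for c in counts.values())
    return (2 * n / (2 * n - 1)) * (1 - homozygosity)


def f_is(genotypes):
    """Simple inbreeding coefficient; positive values mean heterozygote deficit."""
    h_o = observed_heterozygosity(genotypes)
    h_e = unbiased_expected_heterozygosity(genotypes)
    return 1 - h_o / h_e


# Hypothetical diploid genotypes (allele sizes in base pairs) at one locus.
toy_locus = [(100, 100), (100, 104), (104, 104), (100, 104)]
```

A positive value from `f_is` corresponds to the heterozygote deficiency discussed in the Results; a negative value corresponds to heterozygote excess.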

Population structure analysis and differentiation 

The program STRUCTURE 2.3.3 [28] was used to infer 
historical lineages through clustering of similar geno- 
types. The admixture model of STRUCTURE and the 
option of correlated allele frequencies between 



Bradic et al. BMC Evolutionary Biology 2012, 12:9 
http://www.biomedcentral.com/1471-2148/12/9






Table 1 Sample information and descriptive statistics summary of the sampled populations. 
N = sample size per population; n = mean sample size over all loci; P = proportion of polymorphic loci; A = mean number of alleles per locus; Ar = mean
number of alleles standardized to the smallest sample; H_e = unbiased expected heterozygosity, standardized according to the formula (2N/(2N-1))*H_e; H_o =
observed heterozygosity; Lat is the latitude (N) and Long is the longitude (W).



populations were used. The correct number of clusters 
(K) was determined by testing K values from 1 to 12 
and performing 10 repeats for each K. The burn-in period
consisted of 1 x 10^6 iterations followed by 1 x 10^5
MCMC repeats. Finally, estimated log probabilities of
data Pr(X | K) for each value of K were evaluated by
calculating ΔK, the rate of change in the log probability
of data between successive K values [29]. We also esti- 
mated population structure independently with STRUC- 
TURAMA [30] in order to test the STRUCTURE 
inferences. 
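
The ΔK calculation can be sketched as follows, assuming the commonly used definition: the mean absolute second-order difference of ln Pr(X | K) across replicate runs, divided by the standard deviation of ln Pr(X | K) across those runs. The function name and the input layout (a dictionary mapping K to a list of per-run log probabilities) are assumptions made for this example, not the format written by STRUCTURE.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    log_probs maps each K to a list of ln Pr(X | K) values, one per
    replicate run, with runs in the same order for every K.
    """
    result = {}
    for k in sorted(log_probs):
        if k - 1 not in log_probs or k + 1 not in log_probs:
            continue  # delta K is undefined at the ends of the tested range
        second_diffs = [
            abs(nxt - 2 * cur + prev)
            for prev, cur, nxt in zip(log_probs[k - 1], log_probs[k], log_probs[k + 1])
        ]
        # stdev() raises if fewer than two runs; a zero spread would divide by zero.
        result[k] = mean(second_diffs) / stdev(log_probs[k])
    return result


# Invented log probabilities for two replicate runs at K = 1, 2, 3.
toy_runs = {1: [-100.0, -102.0], 2: [-80.0, -82.0], 3: [-78.0, -80.0]}
```

The K with the largest ΔK is taken as the most likely number of clusters; ΔK cannot be evaluated for the smallest or largest K tested.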

While these clustering methods can be quite powerful, 
particularly when there is a high divergence between 
populations [31], they often make explicit assumptions 
of demographic history and sometimes are difficult to 
interpret without background biological information. 
Thus, we complemented the Bayesian analysis using 
other methods to more directly estimate relationships 
among populations. The proportions of shared alleles 
between populations were calculated in the R package 
adegenet 1.2-2 using the propShared function [32] 
where the average proportions of shared alleles among 
and within populations are computed over all possible 
combinations of individuals sampled. The distance 



matrix based on the proportion of shared alleles was 
then transformed into a matrix of Euclidean distances 
using the quasieuclid function. 
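
For readers unfamiliar with the shared-allele metric, a minimal Python sketch of the pairwise calculation follows. It is not the adegenet implementation; the function name, the genotype layout, and the handling of missing loci (skipped) are assumptions made for illustration.

```python
from collections import Counter


def proportion_shared_alleles(ind_a, ind_b):
    """Mean proportion of alleles shared by two diploid individuals.

    Each individual is a list with one (allele1, allele2) tuple per locus,
    or None where the genotype is missing. Loci missing in either
    individual are skipped.
    """
    shared = 0
    compared = 0
    for geno_a, geno_b in zip(ind_a, ind_b):
        if geno_a is None or geno_b is None:
            continue
        # Counter intersection keeps the minimum count of each allele.
        shared += sum((Counter(geno_a) & Counter(geno_b)).values())
        compared += 2
    if compared == 0:
        raise ValueError("no locus was scored in both individuals")
    return shared / compared


# Hypothetical two-locus genotypes.
fish_1 = [(1, 2), (3, 3)]
fish_2 = [(1, 1), (3, 4)]
```

A distance can then be taken as one minus this proportion and averaged over all pairs of individuals within or between populations.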

Private allele estimates and allele richness were calcu- 
lated, grouping the independent geographical regions 
obtained by clustering methods. In order to estimate rari- 
fied allelic richness and private rarified allelic richness, the 
rarified method in HP-RARE [33] was used to control for 
the correlation between observed allelic diversity and sam- 
ple size [34]. The alleles were rarified to a sample size of 
40, the smallest sample size of our population groups. 
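
The rarefaction step can be illustrated with the standard expectation for the number of distinct alleles in a random subsample of g gene copies. This is a sketch of the general formula, not of the HP-RARE code; the function name and the toy counts are hypothetical.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a subsample of g gene copies.

    allele_counts holds the number of copies of each allele observed at one
    locus in one population; g must not exceed their sum.
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample size exceeds the number of sampled genes")
    # For each allele, 1 - P(allele absent from the subsample).
    return sum(1 - comb(n - c, g) / comb(n, g) for c in allele_counts)


# Hypothetical locus with two alleles, five copies each.
toy_counts = [5, 5]
```

Rarefying every group to the same g (40 in this study) removes the dependence of observed allele number on sample size.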

In order to estimate the variance between the groups 
of populations, pooled sample structuring was estimated 
using analysis of molecular variance (AMOVA) [35] and 
20,000 permutations implemented in Arlequin v 3.5.1.2 
[36]. Missing data as observed in our study could influ- 
ence AMOVA results. Thus, locus-by-locus AMOVA 
was used to adjust the sample sizes for each locus and 
the point estimators of variance components to estimate 
F-statistics more accurately [35]. Influences of long-term 
separation and genetic drift were measured by compara- 
tive methods of allelic frequency tests for all population 
combinations using F_ST pairwise estimates [37] as implemented
in MICROSATELLITE ANALYSER (MSA) [25].
Migration patterns between populations

The coalescence-based program MIGRATE-N 3.2.6. 
[18-20] was used to test for and estimate gene flow 
between populations. Three migration models were eval- 
uated: (1) a full model with two population sizes and 
two migration rates (in and out of the caves); (2) a 
model with two population sizes and one migration rate 
(gene flow into the caves); (3) a model with two popula- 
tion sizes and one migration rate (gene flow out of the 
caves). 

MIGRATE-N 3.2.6 [18-20] also estimated the mutation-scaled
effective population size Θ = 4N_e μ, where
N_e is the effective population size and μ is the mutation
rate per generation per locus, as well as mutation-scaled
migration rates M = m/μ, where m is the immigration
rate per generation among populations. The model
comparison was done using Bayes factors that need the 
accurate calculation of marginal likelihoods. These likeli- 
hoods were calculated using thermodynamic integration 
in MIGRATE-N 3.2.6 [20] (Additional File 2). 
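
The model comparison can be sketched in Python as follows, assuming each model has already been assigned a log marginal likelihood. The model labels and numbers below are invented for illustration and are not results from this study; the function names are likewise hypothetical.

```python
from math import exp


def log_bayes_factor(log_ml_a, log_ml_b):
    """Natural-log Bayes factor of model A over model B."""
    return log_ml_a - log_ml_b


def model_probabilities(log_marginal_likelihoods):
    """Convert log marginal likelihoods to model probabilities (equal priors)."""
    best = max(log_marginal_likelihoods.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {m: exp(v - best) for m, v in log_marginal_likelihoods.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


# Invented log marginal likelihoods for the three migration models.
toy_models = {"full": -1000.0, "into_caves": -1002.0, "out_of_caves": -1010.0}
```

The model with the highest probability is preferred; a log Bayes factor near zero indicates that the data do not discriminate between two models.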

Mantel Test 

We tested for isolation by distance among populations 
and sample locations, comparing genetic distance (F_ST/(1 -
F_ST)) versus straight-line geographic distance by application
of the Mantel test as implemented in GENALEX
[38] (999 permutations, significance level p < 0.01). 
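
A permutation-based Mantel test of this kind can be sketched in pure Python as below. This is a generic one-tailed version written for illustration, not the GENALEX implementation; the function names, the seed argument, and the toy matrices in the usage note are assumptions.

```python
import random
from statistics import mean


def linearized_fst(fst):
    """Transform F_ST to F_ST / (1 - F_ST)."""
    return fst / (1.0 - fst)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (observed r, one-tailed p) for two square distance matrices."""
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo = [geographic[i][j] for i, j in pairs]
    observed = pearson([genetic[i][j] for i, j in pairs], geo)
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel populations in the genetic matrix
        permuted = [genetic[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo) >= observed:
            at_least_as_large += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (at_least_as_large + 1) / (permutations + 1)
```

In use, `genetic` would hold the linearized pairwise F_ST values and `geographic` the straight-line distances in the same population order. With few populations the number of distinct permutations is small, which limits the smallest attainable p value.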

Eye size measurements 

In order to test whether variation in eye rudiment size 
was correlated with genotype in the caves with phenoty- 
pically mixed populations, we analyzed digital images of 
individuals from three cave populations: N2 (n = 26),
N3 (n = 72) and O8 (n = 119). Photos were taken in the
lab using a digital camera with the fish placed on a Car- 
tesian coordinate grid. Measurements were made using 
ImageJ (NIH). In order to correct for individual size differences,
relative eye size was standardized as a proportion
of standard body length [39].

Results 

Genetic Diversity 

We calculated descriptive statistics using 26 unlinked 
microsatellite markers. The number of alleles and pro- 
portions of polymorphic loci were generally higher in 
surface than in cave populations, although there was 
considerable variability among populations (Table 1). 
Genetic variability was significantly lower in the cave 
populations than in the surface (Table 1). Average allelic 
number (Ar) ranged from 2.25 ± 0.50 in the Guatemala
region (N1, N2) and 2.54 ± 0.26 in the El Abra (O1 to
O8), to a high of 3.63 ± 0.14 in the surface populations.
Surface Ar was significantly greater than Ar in the Guatemala
and in the El Abra (t_16 = 11.6, t_10 = 8.7,
respectively, P < 10^-6 for both). The Micos cave population
(N3) had an intermediate average number of alleles
per locus (2.98); previous studies have shown this popu- 
lation to contain both cave and surface-dwelling pheno- 
types [7,16]. We also detected monomorphic loci 
(NYU26, 26C, 218A, 213B), which shared the same 
alleles among El Abra cave populations (Additional File 
3). Unbiased expected heterozygosities (H_e) were also
significantly higher in surface populations (0.82 ± 0.04)
than in the El Abra (0.55 ± 0.07; t_16 = 10.8,
P < 10^-6) or Guatemala (0.49 ± 0.13; t_10 = 8.7, P < 10^-5)
populations, while the Micos population (N3) exhibited
an intermediate heterozygosity of 0.66 (Table 1).

Genotypic frequencies 

We tested all the loci used in the study for the presence 
of non-amplifying alleles (null alleles) as described in 
MICRO-CHECKER v2.23 (Van Oosterhout et al. 2004). 
This method is mostly based on significant heterozygote 
deficits relative to HWE (Van Oosterhout et al. 2004). 
However biological factors such as Wahlund effect or 
inbreeding might be easily misconstrued as evidence for 
null alleles (Chakraborty et al. 1992). Null alleles seem 
to be present mostly in the Micos and Chica popula- 
tions (data not shown). If, however, null alleles were the 
cause of the lack of heterozygotes, we would expect 
them to be equally represented in all the other popula- 
tions, as well as locus-specific; this we did not observe. 
Thus, loci out of HWE owing to heterozygous deficit 
seem to be caused not by a technical artifact such as 
null alleles, but rather by population genetic phenom- 
ena. Only one locus (213B) was out of HWE in many 
populations (9 out of 21) (Additional File 3). Thus, data 
were analyzed both including and excluding this locus. 
This made no difference, so the locus was retained. 

We performed 519 tests (27 values were excluded due 
to missing or monomorphic data) and detected 71 sig- 
nificant departures from HWE (based on 0.05 level of 
significance and standard Bonferroni corrections). Most 
of the significant loci showed heterozygote deficiency 
characterized by a positive F_IS value. Heterozygote
excess was detected in a few populations, mostly surface, 
for five loci (214D, 210A, 202D, 104A and 241B) (Additional
File 3). Populations previously described in the literature
[7,40] as phenotypically mixed, O8 and N3,
presented significant deviations from HWE at numerous
loci: 13 and 16 out of 25 loci scored, respectively (Additional
File 3). Presumably, this reflects population subdivision.
The other cave populations exhibited only small
numbers of loci out of HWE and these differed from 
one population to the next (Additional File 3). One 
locus (213B) was out of HWE in many populations (9 
out of 21) (Additional File 3), which may reflect the pre- 
sence of null alleles at this locus. 
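
The link between population subdivision and heterozygote deficiency (the Wahlund effect) can be made concrete with a small worked example. The sketch below assumes a biallelic locus and two subpopulations that are each in HWE but differ in allele frequency; the function name and the frequencies are hypothetical.

```python
def wahlund_f_is(p1, p2, weight=0.5):
    """Apparent F_IS when two HWE subpopulations are pooled into one sample.

    p1 and p2 are the frequencies of the same allele in each subpopulation;
    weight is the fraction of the pooled sample drawn from the first one.
    """
    p_bar = weight * p1 + (1 - weight) * p2
    # Heterozygotes actually present in the pooled sample.
    h_obs = weight * 2 * p1 * (1 - p1) + (1 - weight) * 2 * p2 * (1 - p2)
    # Heterozygotes expected if the pooled sample were one HWE population.
    h_exp = 2 * p_bar * (1 - p_bar)
    return 1 - h_obs / h_exp
```

With allele frequencies of 0.9 and 0.1 in equal proportions, the pooled sample shows an apparent F_IS of 0.64 even though neither subpopulation is inbred; when the two frequencies are equal the deficit vanishes. Unlike a null allele, this deficit is expected at every locus whose frequencies differ between the mixed groups.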
Population structure analysis and differentiation

As a starting point to infer the relationships among
populations we used the clustering algorithm implemented
in the program STRUCTURE [28]. We explored different
numbers of populations K to uncover hierarchical
population structure (Figure 2A). The clear distinction
between the two groups when K = 2 is consistent with
the hypothesis that all of the populations we studied originated
from two stocks: a "new" stock including present-day
surface forms and the "new" cave populations
from the Micos (N3) and Guatemala regions (N1, N2),
and an "old" stock including the El Abra cave populations
(O1-O8) and their locally extinct progenitors
[14-17]. Further structuring represents divergence of O8
from the other El Abra populations at K = 3, the more
recent divergence between the surface populations (S1 -
S4) and the new cave populations (N1 - N3) at K = 4,
and the separate origins of the new cave populations at
K = 5. The ΔK method [29] estimated the most likely number
of populations at K = 5 (Figure 2B, C). We performed a
STRUCTURAMA analysis, which estimated the same
value of K = 5 (posterior probability of 90%; results not
shown).

These five independent population groups are: 1) El
Abra caves (O1-O7), 2) El Abra cave mixed population
(O8), 3) the new cave populations to the north in the
Guatemala region (N1, N2), 4) the new cave mixed
population in the southwest Micos region (N3) and 5)
the surface populations (Figure 2A and 2C). The
STRUCTURE analysis also revealed that four of the cave
populations (N3, O8, N2 and O2) contained alleles from
surface populations at several loci, while the surface
populations showed a smaller number of alleles
from the caves.

We further tested the genetic distances among populations
using the metric of shared alleles. Figure 3A
illustrates that the entire El Abra cluster is the furthest
away from the cluster of the "new" caves (N1, N2 and
N3). Genetically, one El Abra cave population (O8) was
equidistant from the "old" and "new" lineages, while the
Micos (N3) cave population shared the most alleles with
surface populations (Figure 3A).

Private allele estimates were calculated based on 
groupings of populations united by geographical proxi- 
mity, which also corresponded well to the groupings 
revealed both by STRUCTURE and the shared allele dis- 
tance analysis. The private allele content is significantly 
higher in surface populations than in cave populations 
(Figure 3B) (Kruskal-Wallis H test, df = 4, N = 125, H = 
32.8, P < 0.00001 for overall significance, post-hoc com- 
parisons revealed no significant differences among cave 
groups, but all cave groups differed from surface at P < 
0.001). The shared allele and private allele proportions
between surface and cave populations (Figure 3A and
3B) suggest that the allelic contents of cave populations
are largely subsets of the alleles of the surface stock. Thus
the observed variation in the caves is mostly the result
of standing genetic variation from the ancestral surface
stock as well as possible gene flow between the populations
(Figure 3B). However, this result needs to be put
into perspective: since the ancestors of the "old" cave fish
in the Sierra de El Abra are locally extinct, one cannot
compare their allelic composition with that found in
today's El Abra cave populations.

In order to determine genetic structuring in the analyzed
samples, we performed hierarchical AMOVA
(Table 2). First, we narrowed down the population
structuring by grouping populations based on their origins,
"old" vs. "new" [14-17]. Comparison of the El Abra
populations (O1-O8) vs. the Guatemala and Micos caves
(N1 - N3) pooled with surface populations (S1 - S4) was
significant (P < 0.0001), with among-group differences
explaining 4.52% of the variance. This supports the hypothesis
emerging from the STRUCTURE analysis that two different
stocks of surface fish were ancestral to the present-day
cave populations. We tested by AMOVA all inferences
of structuring that emerged from the shared allele distance
and STRUCTURE analyses. The largest proportion
of variance in all of the groupings was within individuals
(Table 2). We found no significant structure among the surface
populations when comparing the regions S1 through S4
(data not shown). Importantly, the AMOVA analyses
supported significant structure among six metapopulations:
1) the El Abra cave populations (O2 - O7), 2) Guatemala
populations (N1 and N2), 3) Micos population
(N3), 4) Chica population (O8), 5) Pachon population
(O1) and 6) surface populations.

Pairwise F_ST comparisons of the geographically
defined populations typically revealed higher divergences
among cave populations, even within a geographical
cluster, than between cave and surface populations
(Table 3). F_ST comparisons revealed less divergence
among populations of the two Guatemala caves (N1,
N2) and Micos cave (N3) (F_ST range from 0.23 to 0.36)
than was seen in comparisons among caves of the El
Abra cluster (F_ST range from 0.20 to 0.51). This low
genetic divergence is notable because the Sierra de Guatemala
and Micos caves are more than 100 kilometers
(km) apart. Consistent with the AMOVA analyses, F_ST
values among surface populations were generally low
(the highest F_ST = 0.09), suggesting that many of these
populations from multiple and distant geographical
regions have high levels of allelic exchange.
On the basis of F_ST values, general divergence between
cave and surface pairs seems to be related to the level of
physical isolation of the particular caves from the
surface water. Four cave populations (O1, O6, N1, and
N2) show the highest F_ST values against the surface
Figure 2 Estimated population structure of Astyanax cave and surface populations using STRUCTURE (Pritchard et al. 2000) for K = 2
and K = 5 population groups (Figure 2A). Each individual is represented by a thin vertical line, which is partitioned into K segments that
represent its estimated population group membership fractions. Black lines separate individuals from geographical site locations (labeled below),
which are as follows: El Abra: O1 - O7; Chica (O8); Guatemala: N1 - N2; Micos: N3; Surface: S1 - S4. Figure 2B. Mean posterior probabilities of
ten runs for each K, K = 1 to K = 12. Figure 2C. K = 5 had the highest ΔK vs. K peak height [29].



populations (Table 3). The first three of these populations
are perched and thus isolated from the underlying
aquifer, while the fourth is in an area with no permanent
surface streams (Additional File 4). Results of a
Mantel test for isolation by distance among populations
O1 to O8 were positive, showing increasingly greater
genetic isolation with increasing geographic distance
(Additional File 5).



As is the case with O1, all seven F_ST values between
O8 and the other old cave populations are significant.
In fact, F_ST analyses reveal that the O8 population is significantly
diverged from every other population of cave
or surface fish we surveyed (Table 3). The average F_ST
value between O8 and the seven other cave populations
of the El Abra group (O1 to O7) was 0.230 ± 0.021
(SEM), which was significantly higher than the average
Figure 3 Genetic variability in Astyanax mexicanus using 26 microsatellite loci. Figure 3A. Proportion of shared alleles (samples of likely
common ancestry determined by shared alleles) between the studied populations shown as Euclidean distances; 95% confidence ellipses
represent each population. Figure 3B. Private allelic richness averaged over geographically grouped populations. Populations are coded as
follows: El Abra caves (O1 - O7); Guatemala (N1 - N2); Micos (N3); Chica (O8); Surface (S1 - S4). All bar plots represent mean ± SEM. Asterisk
denotes that the surface group was significantly different from each of the other groupings at P < 0.0001, as tested by Student's t.
Table 2 Analyses of molecular variance (AMOVA) in cave and surface populations for 26 microsatellite loci

Structure tested                              SS        VC      %VAR    Fstat   P

TWO GROUPS: O1-O8 vs. S1-S4 + N1-N3
  Among groups                                372.71    0.43     4.52   0.07    < 0.000001
  Among populations within groups            1288.50    1.51    15.76   0.16    < 0.000001
  Among individuals within populations       3743.17    0.58     6.02   0.04    < 0.000001
  Within individuals                         3373.50    7.06    73.70   0.26    < 0.000001

THREE GROUPS: O1-O8 vs. N1-N3 vs. S1-S4
  Among groups                                560.21    0.45     4.73   0.05    < 0.000001
  Among populations within groups            1100.99    1.41     1.41   0.16    < 0.000001
  Among individuals within populations       3743.17    0.58     6.08   0.08    < 0.000001
  Within individuals                         3373.50    7.06    74.31   0.26    < 0.000001

FOUR GROUPS: O1-O7 vs. N1-N3 vs. S1-S4 vs. O8
  Among groups                                575.99    0.67     6.95   0.06    < 0.000001
  Among populations within groups            1085.21    1.32    13.75   0.14    < 0.000001
  Among individuals within populations       3743.17    0.58     5.99   0.07    < 0.000001
  Within individuals                         3373.50    7.06    73.30   0.26    < 0.000001

FIVE GROUPS: O1-O7 vs. N1-N2 vs. S1-S4 vs. N3 vs. O8
  Among groups                               1034.04    0.86     9.05   0.13    < 0.000001
  Among populations within groups             539.32    1.20    12.64   0.09    < 0.000001
  Among individuals within populations       3191.16    0.59     6.19   0.07    < 0.000001
  Within individuals                         2852.00    6.83    72.11   0.27    < 0.000001

SIX GROUPS: O2-O7 vs. N1-N2 vs. S1-S4 vs. N3 vs. O8 vs. O1
  Among groups                               1198.68    1.18    12.42   0.12    < 0.000001
  Among populations within groups             374.68    0.89     9.39   0.1     < 0.000001
  Among individuals within populations       3191.16    0.59     6.18   0.07    < 0.000001
  Within individuals                         2852.00    6.83    72.00   0.28    < 0.000001

SS = Sum of squares; VC = Variance components; %VAR = Percentage of variation; Fstat = F-statistics; P = P values.



Table 3 Multilocus pairwise F_ST estimates from 26 microsatellite loci in Astyanax mexicanus.

      EL ABRA                                         GUATEMALA  MICOS  SURFACE STREAMS
        O4    O6    O3    O5    O7    O2    O1    O8    N2    N1    N3    S3    S2    S1    S3
O6    0.06
O3    0.19  0.25
O5    0.06  0.15  0.21
O7    0.08  0.12  0.15  0.08
O2    0.14  0.23  0.05  0.16  0.11
O1    0.28  0.34  0.31  0.30  0.34  0.28
O8    0.22  0.23  0.26  0.18  0.16  0.23  0.33
N2    0.35  0.36  0.33  0.33  0.31  0.33  0.41  0.29
N1    0.47  0.51  0.44  0.47  0.46  0.46  0.48  0.37  0.36
N3    0.24  0.26  0.26  0.23  0.20  0.24  0.31  0.23  0.23  0.27
S3    0.16  0.19  0.15  0.13  0.08  0.14  0.23  0.16  0.19  0.26  0.11
S2    0.21  0.26  0.19  0.20  0.09  0.18  0.30  0.18  0.20  0.33  0.12  0.00
S1    0.23  0.30  0.25  0.20  0.13  0.22  0.31  0.20  0.26  0.39  0.19  0.05  0.07
S3    0.18  0.22  0.18  0.17  0.10  0.18  0.26  0.15  0.22  0.34  0.11  0.02  0.03  0.09
S2    0.19  0.22  0.17  0.17  0.10  0.17  0.25  0.19  0.20  0.28  0.13  0.02  0.01  0.08  0.04
S3    0.17  0.21  0.17  0.15  0.10  0.15  0.24  0.15  0.19  0.25  0.08  0.01  0.02  0.06  0.02
S3    0.17  0.21  0.16  0.14  0.09  0.14  0.25  0.14  0.19  0.28  0.11  0.00  0.01  0.05  0.02
S3    0.18  0.22  0.18  0.15  0.09  0.16  0.26  0.16  0.19  0.28  0.13  0.00  0.02  0.03  0.03
S1    0.18  0.22  0.18  0.16  0.11  0.16  0.25  0.16  0.20  0.28  0.13  0.02  0.03  0.05  0.03
S4    0.19  0.22  0.18  0.16  0.11  0.17  0.26  0.17  0.19  0.26  0.07  0.03  0.04  0.09  0.04

Only the first fifteen columns are legible. Of the remaining
surface-by-surface comparisons, the values 0.04, 0.03, 0.01, 0.01,
0.00, 0.01 and 0.01 survive, but their row and column positions
cannot be determined.

Boldfaced values are significant after Bonferroni correction.

F_ST values between O8 and the ten surface populations
(average F_ST = 0.166 ± 0.006; t n = 5.75, p < 0.0005).

Effective population size and migration rates in Astyanax 
mexicanus 

Effective population sizes (N_e) and migration rates
among populations were estimated with MIGRATE-N 3.2.6,
using Bayesian inference and the Brownian motion
mutation model. The model allows mutation rates to
differ among loci by using the number of alleles per
locus to estimate locus-specific relative mutation rate
modifiers. All estimates of the mutation-scaled
effective population size (θ) were scaled using a
microsatellite mutation rate of 5.56 × 10^-4 per locus
per generation [41,42] to calculate the average
effective population sizes (N_e). Effective population
sizes varied among surface clusters (N_e from ~1011 to
~5058) but were generally greater than in cave
populations (Figure 4). Estimates of N_e in most cave
populations ranged from 831 (O6) to 1326 (O2) (Figure 4).
However, the cave populations that previous studies
reported as mixed were again an exception, with
effective population sizes of 4159 in O8, 1326 in O2,
and 2360 in N3. We used the MIGRATE-N 3.2.6 models [20]
to test for gene flow among individual cave and surface
populations, limiting our inquiry to nearby populations
or adjacent cave clusters. All the models are summarized
in Figure 5 (see Additional File 6 for details). Our
estimates of migration rates and effective population
sizes supported the hypothesis that the genetic
diversities of A. mexicanus cave populations are
functions of introgression from surface populations, as
well as of their effective sizes.

Figure 4 Estimates of effective population size (N_e) based on
Bayesian inferences of migration rates and population sizes
among Astyanax mexicanus populations. The central box of each
plot spans the lower to upper quartile (25th to 75th
percentile). The middle dot represents the median posterior
value over all loci. The horizontal line extends from the 2.5th
percentile to the 97.5th percentile. The x-axis represents N_e.
Populations are coded as follows: El Abra caves (O1 - O7);
Guatemala (N1 - N2); Micos (N3); Chica (O8); Surface (S1 - S4).
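
The scaling from the mutation-scaled population size to N_e
described above can be sketched as follows. This is a minimal
illustration that assumes the standard diploid relation
θ = 4 N_e μ; the function names are hypothetical and are not
part of MIGRATE-N.

```python
# Sketch only: assumes theta = 4 * N_e * mu (diploid nuclear loci).
# Function names are illustrative, not a MIGRATE-N interface.

MU = 5.56e-4  # microsatellite mutation rate per locus per generation (from the text)


def theta_to_ne(theta: float, mu: float = MU) -> float:
    """Return the N_e implied by theta under theta = 4 * N_e * mu."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4.0 * mu)


def ne_to_theta(ne: float, mu: float = MU) -> float:
    """Inverse conversion: the theta implied by a given N_e."""
    return 4.0 * ne * mu


if __name__ == "__main__":
    # N_e values quoted in the text for O6, O2 and O8.
    for ne in (831, 1326, 4159):
        print(f"N_e = {ne:5d} -> theta = {ne_to_theta(ne):.3f}")
```

Because the conversion is linear, any uncertainty in the
assumed mutation rate carries over proportionally to N_e.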



Migration rates between individual populations varied
by several orders of magnitude, and the rates between
cave and surface populations exceeded those between
caves. This is in accord with the calculated F_ST values.
Four different patterns of migration were observed: among
surface populations, among cave populations, from cave
to surface, and from surface to cave. Migration rates
among the four groups of surface populations defined
earlier (S1 - S4) were the highest we observed and were
mostly symmetrical (Additional File 6). Migration rates
between cave and surface populations were largely
asymmetrical, with migration from the surface into caves
typically greater than in the reverse direction. Micos
cave (N3) and its nearby surface population formed the
only case in which migration rates in both directions
were nearly equal, a result consistent with the
STRUCTURE results. Migration rates among the cave
populations were very low, except for caves in the El
Abra or Guatemala clusters that are in close geographic
proximity (O2 - O3; O4; O5 - O6; N1 - N2). Also, N1 - N3
seem to exchange more migrants with the surface than
with populations of the old cave cluster (Figure 5,
Additional File 6). This suggests that proximate caves
can exchange alleles through migration, although not
nearly to the same extent as the surface populations do.

Considering only O1 - O8, we see that migration rates
decrease with increasing geographical distance among
populations (Figure 5, Additional File 4). This
observation supports the hypothesis of underground
connections between nearby populations. Thus O1, the
most geographically distant cave, has the smallest
influx from other cave populations of the El Abra
cluster, while O2 - O3 and O4 - O5 show high gene flow
in both directions (Figure 5, Additional File 6).

In some cases the estimates of gene flow between two
caves or cave clusters appear asymmetric. In both the
Sierra de El Abra and the Guatemala region, these
asymmetries seem related to relative altitude. Figure 5
shows the altitudes above sea level of the fish pools in
the various caves; N2 (175 m) sent more migrants to N1
(125 m) than vice versa. The same is true for O1 (202 m)
to O2/O3 (147/153 m), O2/O3 to O4/O5 (62/84 m), and O7
(88 m) to O4/O5. Thus, we suggest it is easier for
migrants to move downstream than upstream.
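
The altitude argument above can be checked mechanically. The
sketch below is only an illustration: the pool altitudes and
the direction of the larger migration estimate are taken from
the text, while the data layout and the function name are
assumptions made for the example.

```python
# Pool altitudes in metres above sea level, as quoted in the text.
ALTITUDE_M = {
    "N1": 125, "N2": 175,
    "O1": 202, "O2": 147, "O3": 153,
    "O4": 62, "O5": 84, "O7": 88,
}

# (source caves, sink caves): pairs for which the text reports more
# migrants moving from source to sink than in the reverse direction.
REPORTED_FLOWS = [
    (("N2",), ("N1",)),
    (("O1",), ("O2", "O3")),
    (("O2", "O3"), ("O4", "O5")),
    (("O7",), ("O4", "O5")),
]


def is_downhill(sources, sinks, altitude=ALTITUDE_M):
    """True if every source pool lies above every sink pool."""
    lowest_source = min(altitude[cave] for cave in sources)
    highest_sink = max(altitude[cave] for cave in sinks)
    return lowest_source > highest_sink


if __name__ == "__main__":
    for sources, sinks in REPORTED_FLOWS:
        label = f"{'/'.join(sources)} -> {'/'.join(sinks)}"
        print(label, "downhill" if is_downhill(sources, sinks) else "not downhill")
```

Under this strict criterion all four reported asymmetries run
from higher to lower pools, although O7 (88 m) lies only 4 m
above O5 (84 m).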

It must be noted that many of the estimates of migra- 
tion rates are associated with large error terms (Addi- 
tional File 6) and are not precise. Nevertheless, the 
overall trends discussed above seem clear. 

Relationship between eye phenotype and individual 
admixture proportions 

To understand how surface individuals integrate into
cave populations, we compared phenotype and genotype for
individuals collected from the three caves with mixed
populations (O8, N2 and N3). The phenotype we used was
relative eye size, and the genotypic designations for
each of the 26 loci were obtained from the STRUCTURE
analyses (Figure 6). Hybrids between cave and surface
forms are intermediate in phenotype between the two [6].
Our results largely show sorting of phenotype and
genotype into the two main categories, surface and cave.
In addition, however, we observe individuals that are
intermediate in both genotype and phenotype, the
expected state for hybrids between surface and cave
[9,39,43].

Figure 5 Summary of the estimates of gene flow based on Bayesian
inferences of migration rates and population sizes using
MIGRATE-N 3.2.6 among Astyanax mexicanus population clusters
within each geographical region. The arrows represent directions
of migration and their thicknesses are proportional to M (the
ratio of immigration rate to mutation rate). Populations are
coded as follows: El Abra caves (O1 - O7); Guatemala (N1 - N2);
Micos (N3); Chica (O8); Surface (S1 - S4). Asterisks denote
mixed populations.

Discussion 

The Origins of the Cave Populations 

Our data clearly show that the populations of
cave-adapted Astyanax in NE Mexico are derived from two
separate stocks. Previous studies using microsatellites 
and mtDNA markers had also concluded that the cave 
populations were derived from at least two surface 
stocks [14-17]. Our results clarify the affinities of the 
Pachon (O1) and Chica (O8) cave populations. Pachon
was previously placed with the new stock based on 
mtDNA data, but our extensive nuclear DNA data set 
clearly places it with the old stock. Conclusions based 
on mtDNA may be misleading because the presence of 
surface fish in cave populations allows for the possibility 
of the introgression of surface mitochondria [16]. The 
affinities of the Chica population are discussed below. 
Finally, the present study covers the full geographic 
range of the cave populations and reveals no evidence 
that the cave populations are derived from more than 
two clades. 

Although the cave populations derive from only two
separate stocks, more than two subterranean invasions
clearly established the extant cave populations. All of
our structuring analyses support divergence among five
groups and are in accord with the hypothesis that the
cave populations N1 to N3 were established much later
than the populations O1 to O8. Similarities in the
microsatellite allele frequencies in the "new" cave
populations (Molino, Caballo Moro, and Micos; N1 to N3,
in order) and surface populations also confirm that
these populations have recently diverged (Additional
File 1). With the exceptions of Pachon (O1) and Chica
(O8), the shared allele analysis shows that the El Abra
populations cluster tightly. In the case of Pachon (O1)
the divergence is minor and it is much closer to the old
(El Abra) cluster than to the new cave cluster. In
contrast, the Chica population (O8) is not obviously
aligned with either cluster in the shared allele
analysis.

Figure 6 Correlations between genotype and phenotype in
three mixed cavefish populations. Each point represents an
individual fish. Phenotype is represented by relative eye size
and genotype by the admixture proportions from the STRUCTURE
analysis. A represents O8, B represents N2, C represents N3.

The origin of the Chica population (O8) has been a
long-standing question in the Astyanax literature [7].
Our data strongly suggest that the Chica cave population
originated from old stock. This interpretation contrasts
with a previous one, based on mtDNA and a small number
of microsatellite loci, which suggested that it is
phylogenetically young and originated from new stock
[16,44]. If Chica were phylogenetically young, however,
the STRUCTURE analysis should cluster it with the
surface populations, a result not observed. Furthermore,
we should see lower F_ST values between Chica and the
new cave populations (N1 through N3) than between Chica
and the other El Abra populations (O1 through O7), but
the opposite is the case (Chica vs. new: average F_ST =
0.297 ± 0.041 SEM; Chica vs. other El Abra: average F_ST
= 0.230 ± 0.021) (Table 3). Considering the F_ST values,
the STRUCTURE analysis, and the shared allele distance
analysis (Table 3, Figure 2 and Figure 3A), all of which
show Chica to be considerably differentiated from the
rest of the El Abra populations, we suggest that it was
derived from an independent invasion of old stock.
Because of its southernmost location, it may well be the
earliest established of the cave populations.
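
The averages quoted above can be recomputed from the Chica
entries of Table 3. The sketch below is an illustration only:
the values are transcribed from the table, and the grouping
into lists and the helper name are assumptions made for the
example.

```python
from math import sqrt
from statistics import mean, stdev

# Pairwise F_ST values involving Chica (O8), transcribed from Table 3.
CHICA_VS_NEW = [0.29, 0.37, 0.23]  # N2, N1, N3
CHICA_VS_EL_ABRA = [0.22, 0.23, 0.26, 0.18, 0.16, 0.23, 0.33]  # O4, O6, O3, O5, O7, O2, O1
CHICA_VS_SURFACE = [0.16, 0.18, 0.20, 0.15, 0.19, 0.15, 0.14, 0.16, 0.16, 0.17]


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of values."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    groups = [
        ("Chica vs. new", CHICA_VS_NEW),
        ("Chica vs. other El Abra", CHICA_VS_EL_ABRA),
        ("Chica vs. surface", CHICA_VS_SURFACE),
    ]
    for label, values in groups:
        m, sem = mean_sem(values)
        print(f"{label}: {m:.3f} +/- {sem:.3f}")
```

The sample standard deviation (n - 1 denominator) is used for
the SEM, which reproduces the reported figures to three
decimal places.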

Geology and Geography 

Knowledge of geology and geography, as well as genet- 
ics, is needed to understand the pattern of independent 
invasions of the underground that established the extant 
populations. A clear pathway through surface waters 
from the southernmost end of the El Abra all the way 
to the area of Pachon cave existed in the past but at 
present a surface divide separates the ends of the valley 
[7] (Additional File 4). Pachon cave (O1) at the northern
end of the El Abra is 46 km north of Yerbaniz cave
(O2). While there is at least one other known cave
between the two that might have served as a stepping 
stone, it seems likely that the underground invasion that 
established the Pachon cave population was independent 
of those that established the more southern populations. 
This argument is based on the expectation that travel 
from one region to another is much faster through sur- 
face streams than through subterranean passages 
because open waters contain abundant food and provide 
direct passage, while subterranean routes have low food 
reserves and their passages may be maze-like. Surface 
fish can move into caves relatively easily and quickly. 
We constantly see surface Astyanax and other surface
species, including Tilapia, in certain caves, such as
Yerbaniz (O2), Chica (O8) and Micos (N3). The
significance of Tilapia's presence is that it was
introduced into Mexican waters and only became common in
the late 1980s [45]. Therefore, its presence in caves
shows how quickly underground populations may be seeded
from the surface. Thus, for the most distal populations
of a migratory wave, it is far likelier that surface
migrants will have reached and colonized a cave long
before the arrival of underground migrants from the same
source. All seven F_ST values between Pachon and the
other El Abra populations are significant (Table 3),
which reflects the current isolation of the cave and,
perhaps, a past independent origin.

Considering the new cave populations, the distance 
between the Micos (N3) cave and the closest of the 
Guatemala caves, Caballo Moro (N2), is over 90 km and 
there is one ridge and two open valleys between them. 
No documented underground route currently exists 
between the two regions. Thus, the Micos and Guate- 
mala cave clusters likely represent separate invasions. 

In summary, we suggest a model with a minimum of
five independent origins of cave-adapted Astyanax in
NE Mexico. We envision that the area was originally
colonized by surface Astyanax of the old stock, which
independently established cave populations in the south
of the El Abra (O8), in central El Abra (O2 - O7), and
in its north (O1). Subsequently, the surface stock went
extinct locally and was eventually replaced by surface
fish of the new stock. These gave rise to cave
populations in the two geographically distinct regions
of Guatemala (N1 and N2) and Micos (N3) (Additional
File 7).

Allelic diversity, migration and gene flow 

Allelic diversities were generally lower in cave popula- 
tions than in surface populations (Table 1), an observa- 
tion in accord with previous studies on this species and 
other fishes [14,16,46-48]. Lower genetic diversity in 
cave populations than in related surface populations 
probably reflects smaller effective population sizes 
because of food and space limitations, but may also 
reflect possible bottleneck events due to periodic 
droughts and other environmental fluctuations [7]. It 
should be noted, however, that the relatively large
effective population sizes in Micos (N3) and Chica (O8)
were probably overestimated by MIGRATE-N 3.2.6
because they are admixed with the surface populations.

Many of the El Abra caves regularly receive migrants
from the surface [7,48], and Chica (O8) is the best
known of these [40]. Chica is unusual among Astyanax
cave populations in receiving a high energy input,
deriving primarily from two bat roosts located directly
above the largest of the fish pools. Breder noted, and
we still observe today, that the frequency of surface
fish in the pools increases as one goes deeper into the
cave, and is highest in Pool 4, at the level of the
aquifer and located about one km from the Rio Tampaon
[40]. All who have studied this cave have surmised that
surface fish get into the cave from the river through
the aquifer and are able to survive and breed there
because of the high energy input from the bat roosts and
from debris washed into the cave during the rainy season
[7,40]. Thus, Chica draws its occupants from two
different source populations that are well
differentiated from each other. This admixture results
in significant heterozygote deficiencies at numerous
loci. That these departures from HWE are due to the
Wahlund effect is evident from the genotype-phenotype
correlations observed in our study (Figure 6).
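
The Wahlund effect invoked above can be illustrated with a
small calculation. This is a simplified sketch, assuming a
single biallelic locus and two source populations that are
each in Hardy-Weinberg equilibrium; the microsatellite loci in
the study are multi-allelic, and the allele frequencies used
here are invented for illustration.

```python
def wahlund(p1: float, p2: float, w1: float = 0.5):
    """Heterozygosity in a pooled sample of two HWE subpopulations.

    p1, p2: frequency of one allele in each subpopulation.
    w1:     fraction of the pooled sample drawn from subpopulation 1.
    Returns (observed H, H expected under HWE for the pool, F_IS).
    """
    w2 = 1.0 - w1
    # Each subpopulation contributes its own HWE heterozygosity.
    h_obs = w1 * 2 * p1 * (1 - p1) + w2 * 2 * p2 * (1 - p2)
    # The expectation for the pool uses the pooled allele frequency.
    p_bar = w1 * p1 + w2 * p2
    h_exp = 2 * p_bar * (1 - p_bar)
    f_is = 1 - h_obs / h_exp if h_exp > 0 else 0.0
    return h_obs, h_exp, f_is


if __name__ == "__main__":
    # Hypothetical frequencies: cave form 0.9, surface form 0.1.
    h_obs, h_exp, f_is = wahlund(0.9, 0.1)
    print(f"H_obs = {h_obs:.2f}, H_exp = {h_exp:.2f}, F_IS = {f_is:.2f}")
```

The deficit H_exp - H_obs equals 2 w1 w2 (p1 - p2)^2, so it
grows with the divergence between the two sources and
vanishes when their allele frequencies are equal.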

Our collections from the Micos cave (N3) also contained
both cave and surface forms and, as in Chica (O8), we
observed departures from HWE due to Wahlund effects.
In contrast to the situation in Chica, food
is not abundant in this cave, thus the surface fish are 
prone to starvation, leading in most cases to reduced fit- 
ness and inefficient mating [7]. Nevertheless, some sur- 
face fish washed into this cave may hybridize with the 
cave population, as revealed by genotype-phenotype cor- 
relations (Figure 6). The Caballo Moro (N2) population 
exhibits a full range of eye sizes and pigmentation, from 
typical cave to typical surface morphs (Figure 6). This 
population is in a karst window, a habitat within a cave 
exposed to light because of passage collapse; the pre- 
sence of light facilitates the continued survival of surface 
and hybrid phenotypes [7,49]. 

The MIGRATE-N analysis also detected relatively high
rates of gene flow from the Pachon cave population (O1)
to its nearby surface populations, supporting an earlier
suggestion of a route for alleles from cave to surface
[48] (Figure 5, Additional File 6). Estimates of
migration rates and effective population sizes supported
the hypothesis that the genetic diversity of A.
mexicanus cave populations is correlated with the influx
of alleles from surface populations, as well as with
their effective population sizes [48]. The relatively
high rates of migration between cave and surface
populations seen here may not be the rule for cavefish.
For example, migration between cave and surface
populations of Poecilia sulphuraria is relatively low,
even though there are few physical barriers to movement
[5]. In the case of Poecilia, the barrier seems to be
the extreme environment of the sulphidic caves, which
requires physiological adaptation to high levels of H2S,
a condition to which cave Astyanax are not exposed.

The migration rate analysis revealed that surface fish 
in the region form a metapopulation, with extensive 
exchange of genetic material among its component 
populations. Thus, there is high genetic diversity within 
and little genetic differentiation among surface popula- 
tions. In strong contrast, cave populations live under 
dramatically different ecological conditions and often 
have lower population densities. MIGRATE-N results
also show that the effective sizes of surface
populations are generally larger than those of cave
populations, consistent with earlier studies based on
estimates of nucleotide diversity [48] (Figure 4).
Mark-recapture estimates of total population sizes from
Pachon (O1) and Yerbaniz (O2) caves were similar to our
estimates, with averages of 8.5 × 10^3 individuals and
broad 95% confidence intervals ranging from about
1.5 × 10^3 to 17.0 × 10^3 [7]. Our estimates of N_e in
cave populations varied from 2.8 × 10^3 to 7.3 × 10^3,
with the exception of Curva (O6) and the admixed
populations Micos (N3) and Chica (O8). While consistent
with the estimates from mark-recapture studies [7], they
are around one order of magnitude higher than previously
reported estimates from molecular data [48].

We note that the mutation-scaled immigration rate
(M) from surface populations into cave populations
often exceeds 1.0 (Additional File 6). With
mutation-scaled effective population sizes (θ) on the
order of 0.5 to 5, mN_e (θ × M/4) can exceed 1.0,
implying that migration from surface to cave populations
could significantly affect allelic frequencies at
neutral loci [50]. Nevertheless, cavefish in these
populations remain troglomorphic
in phenotype in the face of this immigration. This 
implies that these phenotypes are maintained by selec- 
tion, although we cannot say whether it is natural selec- 
tion imposed by the cave environment or sexual 
selection imposed by mate choice biases against the sur- 
face fish [5,51]. Selection may generally be sufficiently 
powerful to allow population differentiation even in 
situations in which there is high gene flow [52]. 
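
The arithmetic behind the mN_e argument above can be sketched
as follows. With θ = 4 N_e μ and M = m/μ, the product θ × M/4
equals N_e × m, the effective number of immigrants per
generation. The function names are hypothetical, and the input
values only span the range mentioned in the text.

```python
def migrants_per_generation(theta: float, big_m: float) -> float:
    """Effective number of immigrants per generation, N_e * m = theta * M / 4."""
    return theta * big_m / 4.0


def min_m_for_one_migrant(theta: float) -> float:
    """Smallest M at which N_e * m reaches 1.0 for a given theta."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 4.0 / theta


if __name__ == "__main__":
    # theta on the order of 0.5 to 5, as stated in the text.
    for theta in (0.5, 1.0, 2.0, 5.0):
        threshold = min_m_for_one_migrant(theta)
        print(f"theta = {theta:3.1f}: N_e*m >= 1 once M >= {threshold:.1f}")
```

At the upper end of the stated range (θ = 5), an M of only 0.8
already corresponds to one effective migrant per generation,
whereas at θ = 0.5 an M of 8 is needed.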

Finally, we note that the five independent invasions of 
the subterranean habitat documented here imply five 
instances of striking phenotypic convergence. This high- 
lights the importance of a change in ecology as a strong 
driver of evolutionary change. This is in accord with 
studies of freshwater adaptation in Gasterosteus aculea- 
tus that document widespread convergences or paralle- 
lisms related to ecological shifts [53]. 

Conclusions 

Our study showed that cave populations of Astyanax 
mexicanus generally have significantly lower genetic 
variability than surface populations, reflecting the gener- 
ally lower availability of habitat space and food in the 
caves. Some of the cave populations were exceptional 
and had higher genetic diversity, which correlated with 
their receiving relatively high migration from the sur- 
face. We documented significant levels of gene flow 
between surface and cave populations in both directions. 
That cave populations could maintain a cave-specific
phenotypic suite of traits in the face of strong
migration from the surface implies strong selection for
maintenance of the cave phenotype. The results also
demonstrate that cave populations in the region studied
arose at least five times independently and derive from
two different ancestral stocks, implying numerous
convergences on the cave phenotype driven by the
ecological shift from the surface to the underground.
Thus, the Astyanax cavefish model will continue to be a
rich source for the study of adaptive evolution.